Crime impacts communities.
This analysis explores the data collected by the Los Angeles County Sheriff’s Department (LACSD) from the years 2004 to 2015. This data was exported from LACSD’s RMS (Records Management System). The data analyzed is located at https://data.lacounty.gov/Criminal/LA-SHERIFF-CRIMES-FROM-2004-TO-2015/3dxh-c6jw
The following are the findings from the analysis:
Load the packages needed to process the data, generate the figures, and analyze the data.
library(plyr)
library(dplyr)
library(ggplot2)
library(grid)
library(gridExtra)
library(ggmap)
Define functions that are needed.
Download and read the LACSD Crime Data file into memory.
sourceUrl <- "https://data.lacounty.gov/api/views/3dxh-c6jw/rows.csv?accessType=DOWNLOAD"
targetFile <- "LA_SHERIFF_CRIMES_FROM_2004_TO_2015.csv"
# if we haven't already downloaded the file, download it
if (!file.exists(targetFile)) {
  download.file(sourceUrl, destfile = targetFile)
}
# read the file into memory making empty fields NA
crimeData <- read.csv(targetFile, na.strings=c("", "NA"))
Quickly review the data.
summary(crimeData)
## CRIME_DATE CRIME_YEAR CRIME_CATEGORY_NUMBER
## 07/02/2006 12:07:00 AM: 200 Min. :2004 Min. : 1.00
## 07/02/2007 12:07:00 AM: 171 1st Qu.:2006 1st Qu.: 6.00
## 07/01/2008 12:07:00 PM: 165 Median :2009 Median :13.00
## 07/02/2005 12:07:00 AM: 164 Mean :2009 Mean :13.61
## 12/27/2006 03:12:51 PM: 160 3rd Qu.:2012 3rd Qu.:23.00
## 01/01/2004 12:01:00 AM: 152 Max. :2015 Max. :30.00
## (Other) :2332637
## CRIME_CATEGORY_DESCRIPTION STATISTICAL_CODE
## LARCENY THEFT :407492 Min. : 11.0
## VEHICLE / BOATING LAWS:350628 1st Qu.: 92.0
## NARCOTICS :268700 Median :184.0
## BURGLARY :177989 Mean :197.3
## VANDALISM :172087 3rd Qu.:261.0
## GRAND THEFT AUTO :170242 Max. :695.0
## (Other) :786511
## STATISTICAL_CODE_DESCRIPTION
## VEHICLE AND BOATING LAWS: Misdemeanor : 276216
## GRAND THEFT VEHICLE (GTA): Automobile/Passenger Van: 137433
## VEHICLE BURGLARY: Auto/Passenger Van Burglary : 105926
## VANDALISM MISD : 78628
## ASSAULT, NON-AGG: Hands, Feet, Fist, Etc. : 75914
## NARCOTICS: Marijuana Misdemeanors (Less Than 1 oz) : 71139
## (Other) :1588393
## VICTIM_COUNT STREET
## Min. : 1.000 450 BAUCHET ST : 13875
## 1st Qu.: 1.000 29340 THE OLD ROAD : 8056
## Median : 1.000 440 BAUCHET ST : 6879
## Mean : 1.027 1000 UNIVERSAL STUDIOS BLVD: 3285
## 3rd Qu.: 1.000 20700 S AVALON BLVD : 3189
## Max. :53.000 (Other) :2265848
## NA's : 32517
## CITY STATE ZIP
## LOS ANGELES: 267601 CA :2310697 Min. : 9
## LANCASTER : 189984 NV : 135 1st Qu.:90260
## COMPTON : 141579 TX : 95 Median :90706
## PALMDALE : 137426 NY : 86 Mean :90971
## CARSON : 86702 FL : 76 3rd Qu.:91384
## (Other) :1488451 (Other): 541 Max. :98807
## NA's : 21906 NA's : 22019 NA's :1088737
## LATITUDE LONGITUDE GANG_RELATED REPORTING_DISTRICT
## Min. :-148729882 Min. :-274.5 N:2243404 2610 : 24976
## 1st Qu.: 34 1st Qu.:-118.3 Y: 90245 2608 : 22644
## Median : 34 Median :-118.2 2607 : 20919
## Mean : -15704 Mean :-118.2 2611 : 18630
## 3rd Qu.: 34 3rd Qu.:-118.1 1137 : 18450
## Max. : 47 Max. : 37.8 1335 : 18091
## NA's :139917 NA's :139917 (Other):2209939
## STATION_IDENTIFIER STATION_NAME CRIME_IDENTIFIER
## CA0190013: 206937 LAKEWOOD : 206937 Min. :12354772
## CA0190024: 198451 LANCASTER: 198451 1st Qu.:13721760
## CA01900V3: 176148 CENTURY : 176148 Median :14793037
## CA01900W9: 152258 PALMDALE : 152258 Mean :14868051
## CA0190004: 146373 NORWALK : 146373 3rd Qu.:16075134
## CA0190042: 141182 COMPTON : 141182 Max. :17659931
## (Other) :1312300 (Other) :1312300
## GEO_CRIME_LOCATION
## (34.05913144836631699721, -118.23115765035327852035): 7489
## (34.05914118481381540764, -118.23162950988520228384): 5802
## (34.45160990721858960867, -118.61520280236499624484): 4042
## (34.05914139622937544171, -118.23106492756480129503): 3478
## (34.05926873298660148271, -118.23148712692719813448): 2501
## (Other) :2179297
## NA's : 131040
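Tidy the data. There are several issues with the data. Zip codes are missing in many rows (1,088,737 NA values), and latitude and longitude are missing in 139,917 rows. Several latitude/longitude points also fall outside the LAC area, with points as far north as Bakersfield and as far south as San Diego. Since the final output uses latitude and longitude, rows with missing coordinates are removed, along with points outside the bounding box used below. Note that this box is tighter than the county boundary, so it also drops records from parts of the county such as Lancaster and Palmdale, which are prominent in the first summary but absent from the top entries of the filtered one. The ZIP and GEO_CRIME_LOCATION columns are removed entirely.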
# remove columns not needed
crimeData <- subset(crimeData, select= -c(ZIP,GEO_CRIME_LOCATION))
# remove rows where NA is in LAT or Long
crimeData <- crimeData[complete.cases(crimeData[,c("LATITUDE","LONGITUDE")]),]
# remove rows where LAT and/or Long are outside the LAC area
crimeData <- crimeData[crimeData$LATITUDE <= 34.337306,]
crimeData <- crimeData[crimeData$LATITUDE >= 33.703652,]
crimeData <- crimeData[crimeData$LONGITUDE >= -118.668176,]
crimeData <- crimeData[crimeData$LONGITUDE <= -118.155289,]
summary(crimeData)
## CRIME_DATE CRIME_YEAR CRIME_CATEGORY_NUMBER
## 02/24/2009 10:02:00 AM: 108 Min. :2004 Min. : 1.00
## 07/01/2008 08:07:00 AM: 68 1st Qu.:2006 1st Qu.: 6.00
## 05/24/2006 04:05:10 PM: 52 Median :2009 Median :13.00
## 07/01/2008 12:07:00 PM: 52 Mean :2009 Mean :13.42
## 01/01/2007 12:01:00 AM: 48 3rd Qu.:2012 3rd Qu.:23.00
## 01/23/2008 09:01:00 AM: 48 Max. :2015 Max. :30.00
## (Other) :877343
## CRIME_CATEGORY_DESCRIPTION STATISTICAL_CODE
## LARCENY THEFT :130820 Min. : 11.0
## VEHICLE / BOATING LAWS :122264 1st Qu.: 91.0
## NARCOTICS :116875 Median :182.0
## GRAND THEFT AUTO : 71471 Mean :188.8
## NON-AGGRAVATED ASSAULTS: 66890 3rd Qu.:255.0
## BURGLARY : 59556 Max. :695.0
## (Other) :309843
## STATISTICAL_CODE_DESCRIPTION
## VEHICLE AND BOATING LAWS: Misdemeanor : 99211
## GRAND THEFT VEHICLE (GTA): Automobile/Passenger Van : 61632
## VEHICLE BURGLARY: Auto/Passenger Van Burglary : 37201
## ASSAULT, NON-AGG: Hands, Feet, Fist, Etc. : 36700
## Felony Transport. &/or Sale of Controlled Substance(excpt Marijuana): 32108
## Felony Possession of a Controlled Substance (excluding Marijuana) : 26752
## (Other) :584115
## VICTIM_COUNT STREET
## Min. : 1.000 450 BAUCHET ST : 13833
## 1st Qu.: 1.000 440 BAUCHET ST : 6868
## Median : 1.000 1000 UNIVERSAL STUDIOS BLVD: 3220
## Mean : 1.033 20700 S AVALON BLVD : 3186
## 3rd Qu.: 1.000 11710 S ALAMEDA ST : 2698
## Max. :53.000 7100 SANTA MONICA BLVD : 2288
## (Other) :845626
## CITY STATE LATITUDE LONGITUDE
## LOS ANGELES :245297 CA :877676 Min. :33.71 Min. :-118.7
## COMPTON :136451 AZ : 2 1st Qu.:33.89 1st Qu.:-118.3
## CARSON : 83428 MN : 2 Median :33.93 Median :-118.2
## LYNWOOD : 65795 CO : 1 Mean :33.95 Mean :-118.3
## WEST HOLLYWOOD: 59434 FL : 1 3rd Qu.:34.02 3rd Qu.:-118.2
## PARAMOUNT : 37124 (Other): 7 Max. :34.34 Max. :-118.2
## (Other) :250190 NA's : 30
## GANG_RELATED REPORTING_DISTRICT STATION_IDENTIFIER
## N:833488 2112 : 12962 CA01900V3:168899
## Y: 44231 5100 : 11649 CA0190042:133980
## 1624 : 10820 CA0190016:104087
## 2116 : 10818 CA0190003: 86143
## 0972 : 10555 CA0190002: 73861
## 0977 : 10283 CA0190009: 66352
## (Other):810632 (Other) :244397
## STATION_NAME CRIME_IDENTIFIER
## CENTURY :168899 Min. :12354773
## COMPTON :133980 1st Qu.:13788514
## CARSON :104087 Median :14871754
## SOUTH LOS ANGELES: 86143 Mean :14920747
## EAST LOS ANGELES : 73861 3rd Qu.:16127086
## WEST HOLLYWOOD : 66352 Max. :17659931
## (Other) :244397
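The row filters above can be collapsed into one reusable step. The following is a minimal sketch, not the code used in this analysis: the function name `filterToBox` and the toy data frame are hypothetical, and the box limits are the ones from the tidy step. It also reports how many rows were dropped, which makes the size of the filter visible.

```r
# Hypothetical helper: keep only rows whose coordinates are present and
# fall inside a latitude/longitude bounding box.
filterToBox <- function(df, latMin, latMax, lonMin, lonMax) {
  keep <- !is.na(df$LATITUDE) & !is.na(df$LONGITUDE) &
    df$LATITUDE >= latMin & df$LATITUDE <= latMax &
    df$LONGITUDE >= lonMin & df$LONGITUDE <= lonMax
  message(sum(!keep), " of ", nrow(df), " rows dropped")
  df[keep, , drop = FALSE]
}

# Toy data (made up for illustration): one valid point, one missing
# coordinate, and one point north of the box.
toy <- data.frame(
  LATITUDE  = c(33.95, NA, 35.37),
  LONGITUDE = c(-118.40, -118.20, -119.02)
)

boxed <- filterToBox(toy, 33.703652, 34.337306, -118.668176, -118.155289)
```

With the real data, the call would take `crimeData` in place of `toy`.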
A simple plot of all the points in LAC.
#Using GGPLOT, plot the City Map for all years
lac <- get_map(location=c(lon=-118.411732, lat=34.020479), zoom="auto", maptype="roadmap")
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=34.020479,-118.411732&zoom=10&size=640x640&scale=2&maptype=roadmap&language=en-EN&sensor=false
map = ggmap(lac)
mapPoints <- map + geom_point(data=crimeData, aes(x=LONGITUDE, y=LATITUDE, colour=factor(CRIME_CATEGORY_DESCRIPTION)), alpha=.5, size=1) + theme(legend.position = "none")
mapPoints
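To let the Shiny map show crimes for individual years from 2004 to 2015, split the data into one subset per year, then compare the summaries for the first and last years.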
crime2004 <- crimeData[crimeData$CRIME_YEAR == 2004,]
crime2005 <- crimeData[crimeData$CRIME_YEAR == 2005,]
crime2006 <- crimeData[crimeData$CRIME_YEAR == 2006,]
crime2007 <- crimeData[crimeData$CRIME_YEAR == 2007,]
crime2008 <- crimeData[crimeData$CRIME_YEAR == 2008,]
crime2009 <- crimeData[crimeData$CRIME_YEAR == 2009,]
crime2010 <- crimeData[crimeData$CRIME_YEAR == 2010,]
crime2011 <- crimeData[crimeData$CRIME_YEAR == 2011,]
crime2012 <- crimeData[crimeData$CRIME_YEAR == 2012,]
crime2013 <- crimeData[crimeData$CRIME_YEAR == 2013,]
crime2014 <- crimeData[crimeData$CRIME_YEAR == 2014,]
crime2015 <- crimeData[crimeData$CRIME_YEAR == 2015,]
summary(crime2004)
## CRIME_DATE CRIME_YEAR CRIME_CATEGORY_NUMBER
## 01/01/2004 12:01:00 AM: 47 Min. :2004 Min. : 1.00
## 02/09/2004 08:02:00 AM: 38 1st Qu.:2004 1st Qu.: 6.00
## 07/01/2004 12:07:00 PM: 34 Median :2004 Median :13.00
## 04/14/2004 01:04:00 PM: 33 Mean :2004 Mean :13.19
## 01/26/2004 08:01:00 AM: 30 3rd Qu.:2004 3rd Qu.:23.00
## 07/02/2004 12:07:00 AM: 28 Max. :2004 Max. :30.00
## (Other) :72480
## CRIME_CATEGORY_DESCRIPTION STATISTICAL_CODE
## LARCENY THEFT :10736 Min. : 11.0
## VEHICLE / BOATING LAWS :10232 1st Qu.: 91.0
## NARCOTICS : 7576 Median :181.0
## GRAND THEFT AUTO : 7327 Mean :186.6
## NON-AGGRAVATED ASSAULTS: 5306 3rd Qu.:255.0
## BURGLARY : 5119 Max. :612.0
## (Other) :26394
## STATISTICAL_CODE_DESCRIPTION
## VEHICLE AND BOATING LAWS: Misdemeanor : 8503
## GRAND THEFT VEHICLE (GTA): Automobile/Passenger Van : 6403
## VEHICLE BURGLARY: Auto/Passenger Van Burglary : 3723
## ASSAULT, NON-AGG: Hands, Feet, Fist, Etc. : 2996
## VANDALISM MISD : 2680
## Felony Transport. &/or Sale of Controlled Substance(excpt Marijuana): 2513
## (Other) :45872
## VICTIM_COUNT STREET
## Min. : 1.000 450 BAUCHET ST : 1069
## 1st Qu.: 1.000 440 BAUCHET ST : 812
## Median : 1.000 1000 UNIVERSAL STUDIOS BLVD: 427
## Mean : 1.036 710 S LONG BEACH BLVD : 228
## 3rd Qu.: 1.000 20700 S AVALON BLVD : 214
## Max. :19.000 100 UNIVERSAL CITY PLZ : 200
## (Other) :69740
## CITY STATE LATITUDE LONGITUDE
## LOS ANGELES :17470 CA :72688 Min. :33.72 Min. :-118.7
## COMPTON :11645 PA : 1 1st Qu.:33.89 1st Qu.:-118.3
## CARSON : 7510 AK : 0 Median :33.92 Median :-118.2
## LYNWOOD : 5234 AL : 0 Mean :33.95 Mean :-118.3
## WEST HOLLYWOOD : 4691 AR : 0 3rd Qu.:34.02 3rd Qu.:-118.2
## EAST LOS ANGELES: 4058 (Other): 0 Max. :34.33 Max. :-118.2
## (Other) :22082 NA's : 1
## GANG_RELATED REPORTING_DISTRICT STATION_IDENTIFIER
## N:67996 2112 : 1294 CA01900V3:13633
## Y: 4694 5100 : 1230 CA0190042:11499
## 1624 : 911 CA0190016: 9331
## 2170 : 894 CA0190003: 6644
## 0977 : 863 CA0190009: 5877
## 2846 : 856 CA0190002: 5206
## (Other):66642 (Other) :20500
## STATION_NAME CRIME_IDENTIFIER
## CENTURY :13633 Min. :12354773
## COMPTON :11499 1st Qu.:12471805
## CARSON : 9331 Median :12697530
## SOUTH LOS ANGELES: 6644 Mean :12680362
## WEST HOLLYWOOD : 5877 3rd Qu.:12824914
## EAST LOS ANGELES : 5206 Max. :17069979
## (Other) :20500
summary(crime2015)
## CRIME_DATE CRIME_YEAR CRIME_CATEGORY_NUMBER
## 04/15/2015 12:04:00 PM: 24 Min. :2015 Min. : 1.00
## 02/26/2015 12:02:00 PM: 21 1st Qu.:2015 1st Qu.: 6.00
## 04/08/2015 12:04:00 PM: 21 Median :2015 Median :13.00
## 10/26/2015 03:10:00 PM: 21 Mean :2015 Mean :12.92
## 02/13/2015 08:02:00 PM: 20 3rd Qu.:2015 3rd Qu.:22.00
## 03/18/2015 01:03:00 PM: 20 Max. :2015 Max. :30.00
## (Other) :62807
## CRIME_CATEGORY_DESCRIPTION STATISTICAL_CODE
## LARCENY THEFT :11326 Min. : 11.0
## VEHICLE / BOATING LAWS : 8236 1st Qu.: 91.0
## NON-AGGRAVATED ASSAULTS: 6702 Median :183.0
## NARCOTICS : 5484 Mean :193.3
## GRAND THEFT AUTO : 5316 3rd Qu.:261.0
## VANDALISM : 4473 Max. :695.0
## (Other) :21397
## STATISTICAL_CODE_DESCRIPTION
## VEHICLE AND BOATING LAWS: Misdemeanor : 5975
## GRAND THEFT VEHICLE (GTA): Automobile/Passenger Van : 4511
## ASSAULT, NON-AGG: Hands, Feet, Fist, Etc. : 3936
## VEHICLE BURGLARY: Auto/Passenger Van Burglary : 2771
## Misdemeanor Possessn of a Controlled Substance (excluding Marijuana): 2715
## ASSAULT, NON-AGGRAVATED: DOMESTIC VIOLENCE : 2116
## (Other) :40910
## VICTIM_COUNT STREET
## Min. :1.000 450 BAUCHET ST : 1849
## 1st Qu.:1.000 11710 S ALAMEDA ST : 442
## Median :1.000 7100 SANTA MONICA BLVD : 257
## Mean :1.039 440 BAUCHET ST : 218
## 3rd Qu.:1.000 20700 S AVALON BLVD : 216
## Max. :9.000 1000 UNIVERSAL STUDIOS BLVD: 201
## (Other) :59751
## CITY STATE LATITUDE LONGITUDE
## LOS ANGELES :21367 CA :62932 Min. :33.72 Min. :-118.7
## COMPTON : 8126 FL : 1 1st Qu.:33.89 1st Qu.:-118.3
## CARSON : 5479 AK : 0 Median :33.93 Median :-118.2
## LYNWOOD : 4267 AL : 0 Mean :33.96 Mean :-118.3
## WEST HOLLYWOOD: 3859 AR : 0 3rd Qu.:34.03 3rd Qu.:-118.2
## PARAMOUNT : 2579 (Other): 0 Max. :34.34 Max. :-118.2
## (Other) :17257 NA's : 1
## GANG_RELATED REPORTING_DISTRICT STATION_IDENTIFIER
## N:61336 5100 : 1276 CA01900V3:10449
## Y: 1598 5800 : 1199 CA0190042: 7829
## 2112 : 778 CA0190002: 7112
## 0972 : 724 CA0190016: 7033
## 0977 : 720 CA0190003: 6276
## 1624 : 673 CA0190009: 4228
## (Other):57564 (Other) :20007
## STATION_NAME CRIME_IDENTIFIER
## CENTURY :10449 Min. :14853777
## COMPTON : 7829 1st Qu.:17067434
## EAST LOS ANGELES : 7112 Median :17154730
## CARSON : 7033 Mean :17238295
## SOUTH LOS ANGELES: 6276 3rd Qu.:17461241
## WEST HOLLYWOOD : 4228 Max. :17659931
## (Other) :20007
The summaries show fewer recorded crimes in 2015 than in 2004 within the filtered area: 62,934 records versus 72,690, a drop of about 13%. Great job, LAC!
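One way to check the trend across all years, rather than only the two endpoints, is to count records per year. This is a minimal sketch on made-up data: the `toy` data frame stands in for `crimeData`, and the helper name `countsByYear` is hypothetical.

```r
# Hypothetical helper: number of records per CRIME_YEAR as a data frame.
countsByYear <- function(df) {
  counts <- table(df$CRIME_YEAR)
  data.frame(year = as.integer(names(counts)),
             records = as.integer(counts))
}

# Made-up stand-in for crimeData
toy <- data.frame(CRIME_YEAR = c(2004, 2004, 2004, 2015, 2015))
yearly <- countsByYear(toy)

# Relative change between the first and last year in the table
change <- (yearly$records[nrow(yearly)] - yearly$records[1]) / yearly$records[1]
```

Applied to the filtered data, `countsByYear(crimeData)` would give one row per year from 2004 to 2015.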
Plots for each year, to compare 2004 through 2015.
#Using GGPLOT, plot the City Map for each year from 2004 to 2009
mapPoints2004 <- map + geom_point(data=crime2004, aes(x=LONGITUDE, y=LATITUDE, colour=factor(CRIME_CATEGORY_DESCRIPTION)), alpha=.5, size=1) + theme(legend.position = "none")
mapPoints2004
mapPoints2005 <- map + geom_point(data=crime2005, aes(x=LONGITUDE, y=LATITUDE, colour=factor(CRIME_CATEGORY_DESCRIPTION)), alpha=.5, size=1) + theme(legend.position = "none")
mapPoints2005
mapPoints2006 <- map + geom_point(data=crime2006, aes(x=LONGITUDE, y=LATITUDE, colour=factor(CRIME_CATEGORY_DESCRIPTION)), alpha=.5, size=1) + theme(legend.position = "none")
mapPoints2006
mapPoints2007 <- map + geom_point(data=crime2007, aes(x=LONGITUDE, y=LATITUDE, colour=factor(CRIME_CATEGORY_DESCRIPTION)), alpha=.5, size=1) + theme(legend.position = "none")
mapPoints2007
mapPoints2008 <- map + geom_point(data=crime2008, aes(x=LONGITUDE, y=LATITUDE, colour=factor(CRIME_CATEGORY_DESCRIPTION)), alpha=.5, size=1) + theme(legend.position = "none")
mapPoints2008
mapPoints2009 <- map + geom_point(data=crime2009, aes(x=LONGITUDE, y=LATITUDE, colour=factor(CRIME_CATEGORY_DESCRIPTION)), alpha=.5, size=1) + theme(legend.position = "none")
mapPoints2009
#Using GGPLOT, plot the City Map for each year from 2010 to 2015
mapPoints2010 <- map + geom_point(data=crime2010, aes(x=LONGITUDE, y=LATITUDE, colour=factor(CRIME_CATEGORY_DESCRIPTION)), alpha=.5, size=1) + theme(legend.position = "none")
mapPoints2010
mapPoints2011 <- map + geom_point(data=crime2011, aes(x=LONGITUDE, y=LATITUDE, colour=factor(CRIME_CATEGORY_DESCRIPTION)), alpha=.5, size=1) + theme(legend.position = "none")
mapPoints2011
mapPoints2012 <- map + geom_point(data=crime2012, aes(x=LONGITUDE, y=LATITUDE, colour=factor(CRIME_CATEGORY_DESCRIPTION)), alpha=.5, size=1) + theme(legend.position = "none")
mapPoints2012
mapPoints2013 <- map + geom_point(data=crime2013, aes(x=LONGITUDE, y=LATITUDE, colour=factor(CRIME_CATEGORY_DESCRIPTION)), alpha=.5, size=1) + theme(legend.position = "none")
mapPoints2013
mapPoints2014 <- map + geom_point(data=crime2014, aes(x=LONGITUDE, y=LATITUDE, colour=factor(CRIME_CATEGORY_DESCRIPTION)), alpha=.5, size=1) + theme(legend.position = "none")
mapPoints2014
mapPoints2015 <- map + geom_point(data=crime2015, aes(x=LONGITUDE, y=LATITUDE, colour=factor(CRIME_CATEGORY_DESCRIPTION)), alpha=.5, size=1) + theme(legend.position = "none")
mapPoints2015
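The twelve per-year subsets and plotting calls differ only in the year, so they could be generated in a loop. The sketch below is an illustration under assumptions, not the code that produced the figures: `plotYear` is a hypothetical helper, and a plain `ggplot()` canvas with made-up points stands in for the `ggmap` base map so the example runs without a map download.

```r
library(ggplot2)

# Hypothetical helper: overlay one year's crimes on a base plot.
plotYear <- function(basePlot, yearData) {
  basePlot +
    geom_point(data = yearData,
               aes(x = LONGITUDE, y = LATITUDE,
                   colour = factor(CRIME_CATEGORY_DESCRIPTION)),
               alpha = 0.5, size = 1) +
    theme(legend.position = "none")
}

# Made-up stand-in for crimeData
toy <- data.frame(
  CRIME_YEAR = c(2004, 2004, 2015),
  LATITUDE = c(33.95, 34.02, 33.90),
  LONGITUDE = c(-118.40, -118.25, -118.22),
  CRIME_CATEGORY_DESCRIPTION = c("BURGLARY", "NARCOTICS", "BURGLARY"),
  stringsAsFactors = FALSE
)

# One data frame per year, named by year ("2004", "2015", ...)
byYear <- split(toy, toy$CRIME_YEAR)

# One plot per year; with the real data, pass `map` instead of ggplot()
yearPlots <- lapply(byYear, function(d) plotYear(ggplot(), d))
```

`yearPlots[["2004"]]` then displays the plot for 2004, and `byYear[["2004"]]` plays the role of `crime2004`.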
There is a strange concentration of a single color on the map. Zooming in shows that this crime type is concentrated at Los Angeles International Airport (LAX).
#Using GGPLOT, plot a zoomed-in map of the LAX area for all years
lac <- get_map(location=c(lon=-118.411732, lat=33.950479), zoom=14, maptype="roadmap")
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=33.950479,-118.411732&zoom=14&size=640x640&scale=2&maptype=roadmap&language=en-EN&sensor=false
map = ggmap(lac)
mapPoints <- map + geom_point(data=crimeData, aes(x=LONGITUDE, y=LATITUDE, colour=factor(CRIME_CATEGORY_DESCRIPTION)), alpha=.5, size=1) + theme(legend.position = "none")
mapPoints
## Warning: Removed 874789 rows containing missing values (geom_point).
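To see which crime category produces the cluster, the records inside the zoomed window can be tabulated directly. This is a minimal sketch: the window limits below are rough, assumed values around the map center used above, not the exact extent of the downloaded tile, and `toy` is made-up data standing in for `crimeData`.

```r
# Assumed window around the map center (lat 33.950479, lon -118.411732)
latRange <- c(33.93, 33.97)
lonRange <- c(-118.43, -118.39)

# Made-up stand-in for crimeData
toy <- data.frame(
  LATITUDE = c(33.945, 33.950, 33.955, 34.050),
  LONGITUDE = c(-118.410, -118.400, -118.405, -118.250),
  CRIME_CATEGORY_DESCRIPTION = c("LARCENY THEFT", "LARCENY THEFT",
                                 "NARCOTICS", "BURGLARY"),
  stringsAsFactors = FALSE
)

# TRUE for rows whose coordinates fall inside the window
inWindow <- toy$LATITUDE >= latRange[1] & toy$LATITUDE <= latRange[2] &
  toy$LONGITUDE >= lonRange[1] & toy$LONGITUDE <= lonRange[2]

# Category counts inside the window, most frequent first
windowCounts <- sort(table(toy$CRIME_CATEGORY_DESCRIPTION[inWindow]),
                     decreasing = TRUE)
```

Plotting only the rows selected by `inWindow` would likely also avoid the "Removed ... rows" warning, since that warning counts the points that fall outside the zoomed map.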
# save the cleaned data, and a smaller subset (2011 onward), as CSV files
saveToFile <- "CrimeData.csv"
saveToFileSmall <- "CrimeDataSmall.csv"
# only write each file if it does not already exist
if (!file.exists(saveToFile))
  write.table(crimeData, saveToFile, sep=",")
if (!file.exists(saveToFileSmall))
{
  smallCrime <- crimeData[crimeData$CRIME_YEAR >= 2011,]
  write.table(smallCrime, saveToFileSmall, sep=",")
}